05_1_A_Benchmark_Methods_FittedVals

References

  1. Hyndman, R.J., & Athanasopoulos, G., 2021. Forecasting: principles and practice. 3rd edition.
  2. Fable package documentation

Libraries

library(fpp3)

Introduction

In this session we are going to:

  • Explore some benchmark models. These are simple models, yet they will serve as a measure of performance against which we will compare more complex models.
    • If more complex models do not outperform these benchmarks, their complexity will not be justified.
  • Introduce the concept of fitted values and compare it to forecasts.
  • Practice the basic steps behind fitting models and forecasting with the library fable.
  • Do some exercises.

For most of the simple examples in this notebook we will work with this time series studying the production of bricks between 1970 and 2004:

bricks <- aus_production %>%
  filter_index("1970 Q1" ~ "2004 Q4") %>%
  select(Bricks)

head(bricks)
# A tsibble: 6 x 2 [1Q]
  Bricks Quarter
   <dbl>   <qtr>
1    386 1970 Q1
2    428 1970 Q2
3    434 1970 Q3
4    417 1970 Q4
5    385 1971 Q1
6    433 1971 Q2

Fitted values vs forecasts

Before using a model, we will need to fit that model (classic math terminology). This process is also known as training the model (machine learning terminology). This means that we will use data to produce estimates of the parameters of the model. We will call this data training data.

Note a few important technical details:

  • I talk about estimates of the parameters because we are estimating these parameters from a particular sample (in our case, from a particular realisation of the time series).
    • Notation:
      • \(\beta\) - true value of the parameter.
      • \(\hat{\beta}\) - estimate of the parameter.
    • Many times we will abuse this notation and write \(\beta\) instead of \(\hat{\beta}\), but know that, when fitting a model, we are producing estimates of the parameters.
  • If we fitted the model to another sample or to another realization of the time series (think of this as repeating an experiment), we would produce different estimates of those parameters.
  • Being estimates, those parameters have associated standard errors.

Fitted values - example with linear regression

Now let us resort to your knowledge of linear regression to illustrate the concept of fitted values. When fitting a linear regression model to predict a variable \(y\) based on the values of a predictor \(x\), we take a sample of data (our training data) and produce estimates of the model parameters.

Once these estimates \(\hat{\beta_0}\) and \(\hat{\beta_1}\) have been produced, we can compute the values of \(y\) predicted by the model for the specific values of \(x\) of the training data. The values of \(y\) returned by the model would be the fitted values \(\hat{y_t}\) of the linear regression model.

At each point there is a difference between the actual value of the training data \(y_t\) and the fitted value \(\hat{y_t}\). This is what we call the random error. The word error here should not be understood as mistake, but rather as anything that may affect \(y_t\) other than the linear effect of \(x_t\):

\[ y_t - \hat{y_t} = \varepsilon_t \]

The image below illustrates this:
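To complement the illustration, here is a minimal numeric sketch in base R (simulated data, not part of the original example) showing that the residuals are exactly the observations minus the fitted values:

```r
# Simulated training data: y depends linearly on x plus random error
set.seed(42)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 1.5)

# Fit the linear regression and extract fitted values and residuals
fit   <- lm(y ~ x)
y_hat <- unname(fitted(fit))     # fitted values, y_hat_t
e_t   <- unname(residuals(fit))  # residuals, y_t - y_hat_t

# The two quantities match term by term
isTRUE(all.equal(y - y_hat, e_t))
```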

Time Series case - Fitted values and forecasts

The following figure will serve to clarify the difference between the concepts of fitted values and forecasts. This figure summarizes the forecasting task and is very relevant for both this and the second part of the course:

In this figure:

  • The black dots represent the observations \(y_t\) available at the time of forecasting. These extend from \(t=1\) to \(t=T\).
  • The blue dots represent the fitted values: the values returned, in the training region, by the model we have fitted (producing estimates of the model parameters).
    • These fitted values are defined as one-step ahead forecasts on the training data: \(\hat{y}_{t|t-1}\). More on this later.
    • The observations minus the fitted values are the residuals of the model.
  • The green dots represent the forecasts beyond the available data, at a horizon \(h\): \(\hat{y}_{T+h|T}\).
    • These forecasts are produced by the model using the data in the training region from \(t=1\) to \(t=T\).
    • The forecasts extend beyond the training region. Hence \(\hat{y}_{T+h|T}\), meaning: forecast at \(t=T+h\) given all the information up to \(t=T\).
  • The orange dots represent future observations not available at the time of forecasting. Once these are available, we could compute the forecast errors.

More on these concepts later on in the notebook.

Forecasts - \(\hat{y}_{T+h|T}\)

When fitting a time series model to a given realization of a time series - that is, to our training data - the goal is to produce estimates of the model parameters. Once the model is fitted (once we have those estimates), we may produce forecasts beyond the training data. Let us assume that:

  • The training data starts at \(t=1\) and ends at \(t=T\).
  • The forecasts are produced for a horizon \(h\).

The notation for the forecast at \(T+h\) given everything that has occurred up to time \(T\) (end of the training data) is:

\[ \begin{align*} \text{Forecast at time T+h based on what has happened up to time T:} && \hat{y}_{T+h|T} \end{align*} \]

Fitted values - \(\hat{y}_{t|t-1}\)

Fitted values are one-step ahead, in-sample (i.e., in the time region of the training data) “forecasts” (we will see some exceptions to this definition):

  • one-step ahead means that the forecast only extends one time step into the future (\(h=1\)).
    • To produce the one-step ahead forecast at time \(t\), only information up to \(t-1\) is considered.
  • in-sample means that the forecasts belong to the time-region of the training data (training data = the time series to which we fit the model).

\[ \begin{align*} \text{Fitted value at time t based on what has happened up to time t-1:} && \hat{y}_{t|t-1} \end{align*} \]

Given their definition as one-step ahead forecasts, we denote the fitted value at time \(t\) as \(\hat{y}_{t|t-1}\), meaning the forecast is based on observations up to \(t-1\), i.e. {\(y_{1},\dots,y_{t-1}\)}.

We will abuse notation and sometimes refer to the fitted values simply as \(\hat{y_t}\) instead of \(\hat{y}_{t|t-1}\).

In later sessions we will see that, for many models, the equation for the fitted values will be obtained from the equation for a forecast at a horizon \(h\) as follows:

\[ \begin{align*} \text{Forecast at a horizon h given information up to t=T}: && \hat{y}_{T+h|T} \\ \text{Particularize at h = 1}: && \hat{y}_{T+1|T} \\ \text{Particularize equation at t instead of T+1}: && \hat{y}_{t|t-1} \end{align*} \]

The last change in the equation (particularize at \(t\) instead of \(T+1\)) is simply substituting \(T\) by \(t-1\). That is, we are using the change of variable \(t=T+1 \rightarrow T=t-1\). Hence:

\[ \begin{equation} \hat{y}_{T+1|T} \underset{\substack{\uparrow \\ \text{C.V:} \\ T = t-1}}{=} \hat{y}_{t-1+1|t-1} = \hat{y}_{t|t-1} \end{equation} \]

The graph below shows an example of all these concepts:

  • The black line represents the bricks time series.
  • The dashed red line represents the fitted values resulting from fitting the Naive model to this time series. As we will see later, these are one-step ahead forecasts.
  • The blue line represents the forecasts given by the Naive model fitted to the data. These extend several time steps into the future.
    • Prediction intervals of 80% and 95% are also depicted.

Forecasts compared to fitted values - summary

The table below summarizes the comparison between the concepts of fitted values vs. forecasts that we explained in the foregoing section:

Property | Forecasts - \(\hat{y}_{T+h|T}\) | Fitted values - \(\hat{y}_{t|t-1}\)
Definition | Forecast at \(T+h\) given everything that occurred up to \(T\) | Fitted value at \(t\) given everything that occurred up to \(t-1\)
Range | Extend beyond the time range of the training data (beyond \(T\)) | Limited to the time range of the training data (\(t \leq T\))
Steps | Can be multi-step, that is, for a horizon \(h>1\) | Are one-step ahead (\(h=1\))

Importantly, sometimes fitted values will not be strictly one-step ahead forecasts (for example, in the Mean model). When we encounter examples of this, I will mention it explicitly. But the definition holds for the general case.

Point Forecast and Forecast Distribution

To take a statistical approach to the forecasting task and be able to account for the uncertainty, it is best to consider a time series as a collection of random variables indexed over time. A collection of random variables indexed over time is a type of stochastic process (there are many possible structures for a stochastic process, time series is just one of them).

  • Past values of the time series {\(y_1, y_2, \dots, y_T\)} are random variables that have been realized; that is, they have taken a specific outcome, a number: they have collapsed to a single value.
  • Future values of the time series \(y_{T+h|T}\) are random variables that have yet to be realized.

Let us look at the following example of the international arrivals to Australia, in which a model has been fitted and some forecasts have been produced:

If we focus on the specific forecast region, each point \(T+h\) in the future has an associated forecast distribution:

More technically, at each point \(T+h\) in the future corresponding to the forecast region we have:

  • \(y_{T+h|T}\), the random variable at \(T+h\) given the information available up to time \(T\)
  • A forecast distribution associated to the random variable \(y_{T+h|T}\) that describes the probability of occurrence of specific values.
  • A point forecast \(\hat{y}_{T+h|T}\) that is the mean of the forecast distribution (the median is also used sometimes).

The most important concepts signaled here are:

  • Forecast distribution as the probability density function describing the probabilities associated to the set of values that the forecast \(y_{T+h|T}\) may take.
  • Point forecast \(\hat{y}_{T+h|T}\) as the mean value of the forecast distribution.

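As a sketch of how these objects appear in fable (reusing the bricks series and the mean model covered later in this notebook; hilo() is the fabletools helper that converts a distribution column into intervals):

```r
library(fpp3)

bricks <- aus_production %>%
  filter_index("1970 Q1" ~ "2004 Q4") %>%
  select(Bricks)

fc <- bricks %>%
  model(mean = MEAN(Bricks)) %>%
  forecast(h = 4)

# The Bricks column holds the forecast distribution y_{T+h|T};
# .mean holds the point forecast (the mean of that distribution).
# hilo() extracts a prediction interval from the distribution:
fc %>% hilo(level = 95)
```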

Residuals

The residuals are the difference between the observations and the corresponding fitted values. At a specific point \(t\), the residual is:

\[ e_{t} = y_{t}-\hat{y}_{t}. \]

Innovation residuals

These are the residuals in the transformed domain, only relevant if a transformation has been used. If no transformation has been used, residuals and innovation residuals will match.

Let us look at an example. If we use a logarithmic transformation on the data:

  • Original data: \(y_t\)
  • Transformed data: \(w_t = log(y_t)\)

We would fit the model in the transformed domain, to our transformed data, hence generating

  • Fitted values in the transformed domain: \(\hat{w}_{t|t-1} \underset{\underset{\text{Notation}}{\uparrow}}{=} \hat{w_t}\)
  • Residuals in the transformed domain: \(w_t - \hat{w_t}\)

By reversing the transformation (backtransforming) we would obtain:

  • Fitted values in the original domain: \(\hat{y_t} = e^{\hat{w}_t}\)
  • Residuals in the original domain: \(y_t - \hat{y_t}\)

We will see later in this notebook that, in the fpp3 library, we can obtain both the innovation residuals and the residuals using the function augment().
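A short sketch of the two residual types, assuming a naïve model fitted to the log of the Australian exports series (the column names are those returned by augment()):

```r
library(fpp3)

aus_exports <- global_economy %>% filter(Country == "Australia")

# Fit the model in the transformed (log) domain
fit_log <- aus_exports %>% model(naive_log = NAIVE(log(Exports)))
aug_log <- augment(fit_log)

# .resid = y_t - y_hat_t (original scale)
# .innov = w_t - w_hat_t (log scale, i.e. the innovation residuals)
aug_log %>% select(Year, Exports, .fitted, .resid, .innov)
```

Without a transformation, the .resid and .innov columns coincide, as stated above.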

Mean model

Forecasts of all future values are equal to the average of the historical data.

\[ \hat{y}_{T+h|T} = \bar{y} = (y_1 + y_2 + y_3 +\dots + y_{T})/T. \]

Defining and training the model (generating the fitted values or estimates)

fit <- 
  bricks %>% 
  model(
    mean = MEAN(Bricks)
    )
fit
# A mable: 1 x 1
     mean
  <model>
1  <MEAN>

This particular mable (model table) has 1 row (1 time series fitted) and 1 column (1 model).

Fitted values

The function augment() allows us to extract some relevant information contained within the model objects generated by the fpp3 library. Among other things, fitted values and residuals. Note that the model also contains the original time series (column Bricks).

fitted_vals <- 
  fit %>% 
  augment()

head(fitted_vals)
# A tsibble: 6 x 6 [1Q]
# Key:       .model [1]
  .model Quarter Bricks .fitted .resid .innov
  <chr>    <qtr>  <dbl>   <dbl>  <dbl>  <dbl>
1 mean   1970 Q1    386    451.  -64.9  -64.9
2 mean   1970 Q2    428    451.  -22.9  -22.9
3 mean   1970 Q3    434    451.  -16.9  -16.9
4 mean   1970 Q4    417    451.  -33.9  -33.9
5 mean   1971 Q1    385    451.  -65.9  -65.9
6 mean   1971 Q2    433    451.  -17.9  -17.9

The mean model is a good example in which fitted values are not one-step ahead forecasts on the training data, because it uses data beyond the point at which each fitted value is computed.

  • That is, the fitted value at every point is computed using all the available data at \(t=1, 2, 3, \cdots, T\) and computing the mean \((y_1 + y_2 + y_3 +\dots + y_{T})/T\).
  • As an example, the fitted value at \(t=3\) (\(\hat{y}_{3|2}\)) is not computed solely using \(y_1\) and \(y_2\) as the strict definition says, but rather using all the information from \(t=1, 2, 3, \cdots, T\). That is why the mean model is an exception in terms of the definition of fitted values.

fitted_vals %>% 
  filter(.model == "mean") %>%
  autoplot(Bricks, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
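We can verify this exception numerically: every fitted value of the mean model equals the mean of the whole training sample (a quick self-contained check):

```r
library(fpp3)

bricks <- aus_production %>%
  filter_index("1970 Q1" ~ "2004 Q4") %>%
  select(Bricks)

aug <- bricks %>% model(mean = MEAN(Bricks)) %>% augment()

# A single distinct fitted value: the mean of the full training sample,
# which uses observations "from the future" of each time t
isTRUE(all.equal(unique(aug$.fitted), mean(bricks$Bricks)))
```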

Forecasts

To produce forecasts in the fable library we use the forecast() function on the fitted model. We specify an argument: h, the number of time steps we wish to forecast.

# produce forecasts for 8 timesteps
forecasts <- fit %>% forecast(h = 8)
forecasts
# A fable: 8 x 4 [1Q]
# Key:     .model [1]
  .model Quarter       Bricks .mean
  <chr>    <qtr>       <dist> <dbl>
1 mean   2005 Q1 N(451, 4022)  451.
2 mean   2005 Q2 N(451, 4022)  451.
3 mean   2005 Q3 N(451, 4022)  451.
4 mean   2005 Q4 N(451, 4022)  451.
5 mean   2006 Q1 N(451, 4022)  451.
6 mean   2006 Q2 N(451, 4022)  451.
7 mean   2006 Q3 N(451, 4022)  451.
8 mean   2006 Q4 N(451, 4022)  451.

The output is a so-called fable (forecast table). Some comments about the result:

  • The column .mean represents the forecast for each specific horizon \(h=1, 2, \dots\). That is, the point forecast \(\hat{y}_{T+h|T}\). The notation .mean is used because it refers to the mean of the forecast distribution as was explained earlier in this session.
  • The column Bricks (or whatever the variable name is) in the resulting fable contains a distribution object detailing the forecast distribution at each forecast horizon.
    • For the case at hand we can see that these are normal distributions with \(\mu =451\) and \(\sigma^2 = 4022\).
    • We will see how to handle these distribution objects later in the subject.

Depicting forecasts along with the time series

Important: to depict the forecasts along with the time series use autoplot on the fable object (in our case the variable forecasts) and then specify the original tsibble object within autoplot (in this case bricks). This will result in the forecasts being depicted along the original time series:

# To depict the forecasts along with the original series, use autoplot() on the fable
forecasts %>% 
  
  # Depicts the original time series and the forecasts
  autoplot(bricks, level = NULL)

If you now wish to add the fitted values, you may use geom_line() to add another layer to the graph:

# To depict the forecasts along with the original series, use autoplot() on the fable
forecasts %>% 
  
  # Depicts the original time series and the forecasts
  autoplot(bricks, level = NULL) +
    
  # Overlays the fitted values
  geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")

Finally, if you want to include the prediction intervals, remove the level argument:

# To depict the forecasts along with the original series, use autoplot() on the fable
forecasts %>% 
  
  # Depicts the original time series and the forecasts
  autoplot(bricks) +
    
  # Overlays the fitted values
  geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")

Estimates of the model parameters

Using the function tidy() on a fitted model, we can get the estimates of the parameters of the model. In this case there is only one parameter, the mean of the sample.

tidy(fit)
# A tibble: 1 × 6
  .model term  estimate std.error statistic   p.value
  <chr>  <chr>    <dbl>     <dbl>     <dbl>     <dbl>
1 mean   mean      451.      5.34      84.4 2.58e-121

Naïve model

The Naïve model sets all forecasts to be the value of the last available observation (observation at \(t=T\))

\[ \hat{y}_{T+h|T}=y_T \]

Let us build an aus_exports series from global_economy to illustrate this with an example:

aus_exports <- filter(global_economy, Country == "Australia")
autoplot(aus_exports, Exports)

Defining and training the model (generating the fitted values or estimates)

fit <- aus_exports %>% model(Naive = NAIVE(Exports))
fit
# A mable: 1 x 2
# Key:     Country [1]
  Country     Naive
  <fct>     <model>
1 Australia <NAIVE>

Again the mable has 1 row (we are only fitting the model to 1 time series) and one column for the model (we only fitted one model).

Fitted values

The naïve model produces the fitted values by calculating a one-step ahead forecast from the previous observation.

By setting \(h=1\) on the forecasts of the Naive model, particularized at \(t+1\) instead of \(T+1\) to remain in the training region, we obtain the equation for the fitted values:

\[ \hat{y}_{t+1|t}=y_t \]

This gives us the fitted value at \(t+1\). We want the fitted value at \(t\), so we simply shift that equation one time-step to the past accordingly. Wherever we have \(t\), we write \(t-1\) and simplify, obtaining:

\[ \hat{y}_{t|t-1}=y_{t-1} \]

This is the equation for the fitted value at \(t\) in the case of the Naïve model. The fitted value at \(t\) is simply the observation at \(t-1\). Note that, as a result, there will be no fitted value at \(t=1\) for the Naïve model, since there is no previous value.
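A quick self-contained numeric check of this equation using fable's augment():

```r
library(fpp3)

aus_exports <- global_economy %>% filter(Country == "Australia")
aug <- aus_exports %>% model(Naive = NAIVE(Exports)) %>% augment()

# No fitted value at t = 1, and thereafter the fitted value at t
# is exactly the observation at t - 1
is.na(aug$.fitted[1])
isTRUE(all.equal(aug$.fitted[-1], aug$Exports[-nrow(aug)]))
```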

Let us depict the fitted values for our example:

# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals) 
# A tsibble: 6 x 7 [1Y]
# Key:       Country, .model [1]
  Country   .model  Year Exports .fitted .resid .innov
  <fct>     <chr>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 Australia Naive   1960    13.0    NA   NA     NA    
2 Australia Naive   1961    12.4    13.0 -0.591 -0.591
3 Australia Naive   1962    13.9    12.4  1.54   1.54 
4 Australia Naive   1963    13.0    13.9 -0.937 -0.937
5 Australia Naive   1964    14.9    13.0  1.93   1.93 
6 Australia Naive   1965    13.2    14.9 -1.72  -1.72 
# Print fitted values along with the original series
fitted_vals %>% 
  filter(.model == "Naive") %>%
  autoplot(Exports, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).

The above graph looks very similar to the first lag of the time series! In fact, that is exactly what it is.

Forecasts

# produce forecasts for 8 timesteps
forecasts <- fit %>% forecast(h = 8)
forecasts
# A fable: 8 x 5 [1Y]
# Key:     Country, .model [1]
  Country   .model  Year    Exports .mean
  <fct>     <chr>  <dbl>     <dist> <dbl>
1 Australia Naive   2018 N(21, 1.5)  21.3
2 Australia Naive   2019 N(21, 3.1)  21.3
3 Australia Naive   2020 N(21, 4.6)  21.3
4 Australia Naive   2021 N(21, 6.1)  21.3
5 Australia Naive   2022 N(21, 7.6)  21.3
6 Australia Naive   2023 N(21, 9.2)  21.3
7 Australia Naive   2024  N(21, 11)  21.3
8 Australia Naive   2025  N(21, 12)  21.3
# Depict the forecasts
forecasts %>%
  autoplot(aus_exports, level = NULL) +
  
  # Overlays the fitted values
  geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).

As previously explained, unlike the fitted values, the forecasts can extend multiple steps ahead into the future.

Estimates of the model parameters

tidy(fit)
# A tibble: 0 × 7
# ℹ 7 variables: Country <fct>, .model <chr>, term <chr>, estimate <dbl>,
#   std.error <dbl>, statistic <dbl>, p.value <dbl>

As we can see, the Naïve model does not have any parameters: it forecasts by extending the last observation, so there is nothing to estimate.

Seasonal naïve

Each forecast is equal to the last observed value from the same season.

  • For example: the value of the series in the same month of the previous year.

Formally, the forecast at time \(T+h\) is:

\[ \hat{y}_{T+h|T} = y_{T+h-m(k+1)} \]

where:

  • \(m\) is the seasonal period (length in timesteps of one season)
  • \(k\) is the integer part of \((h-1)/m\) (i.e., the number of complete years in the forecast period prior to time \(T+h\))
    • Example: for monthly data (\(m=12\))
      • \(1 \leq h \leq 12\) (forecasts of up to one season ahead)
        • \(0 \leq (h-1)/m < 1 \rightarrow k = int(\frac{h-1}{m}) = 0\).
        • Since \(k=0 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(0+1)}=y_{T+h-m}\)
        • i.e.: \(\hat{y}_{T+h|T}=y_{T+h-m}\): the forecast is equal to the matching observation of the last available season in the historical data.
      • \(13 \leq h \leq 24\) (forecasts of up to two seasons ahead)
        • \(1 \leq (h-1)/m < 2 \rightarrow k = int(\frac{h-1}{m}) = 1\).
        • Since \(k=1 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(1+1)}=y_{T+h-2m}\)
        • i.e.: \(\hat{y}_{T+h|T}=y_{T+h-2m}\): again the matching observation of the last available season in the historical data.
      • \(25 \leq h \leq 36\) (forecasts of up to three seasons ahead)
        • \(2 \leq (h-1)/m < 3 \rightarrow k = int(\frac{h-1}{m}) = 2\).
        • Since \(k=2 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(2+1)}=y_{T+h-3m}\)
        • i.e.: \(\hat{y}_{T+h|T}=y_{T+h-3m}\): again the matching observation of the last available season in the historical data.
      • \(37 \leq h \leq 48\) (forecasts of up to four seasons ahead), and so on.

The interpretation of the formula is therefore quite simple. In the case of monthly data:

  • all future February values will be equal to the last observed February value.
  • all future March values will be equal to the last observed March value.

This will be better understood with the examples below.
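For instance, a quick numeric check with monthly data (the same employment series used in the example below) that the seasonal naïve forecasts simply repeat the last observed year:

```r
library(fpp3)

employed <- us_employment %>%
  filter(Title == "Total Private", Month >= yearmonth("2010 Jan"))

fc <- employed %>% model(SNaive = SNAIVE(Employed)) %>% forecast(h = 24)

# h = 1..12 repeats the last observed year (k = 0),
# and h = 13..24 repeats that same year again (k = 1)
isTRUE(all.equal(fc$.mean[1:12], tail(employed$Employed, 12)))
isTRUE(all.equal(fc$.mean[1:12], fc$.mean[13:24]))
```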

Let us do an example with the us_employment series:

employed <- filter(us_employment, Title == "Total Private", Month >= yearmonth("01-01-2010"))
head(employed)
# A tsibble: 6 x 4 [1M]
# Key:       Series_ID [1]
     Month Series_ID     Title         Employed
     <mth> <chr>         <chr>            <dbl>
1 2010 Jan CEU0500000001 Total Private   105444
2 2010 Feb CEU0500000001 Total Private   105490
3 2010 Mar CEU0500000001 Total Private   106176
4 2010 Apr CEU0500000001 Total Private   107220
5 2010 May CEU0500000001 Total Private   107931
6 2010 Jun CEU0500000001 Total Private   108710
autoplot(employed, Employed)

Defining and training the model

fit <- employed %>% model(SNaive = SNAIVE(Employed))
fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID       SNaive
  <chr>          <model>
1 CEU0500000001 <SNAIVE>

NOTE: we may use the special function lag() to specify the seasonal period the method is to consider. Most of the time this is not necessary, but in case there are multiple possible seasonal periods, you may want to specify which one you want SNAIVE() to consider.

# In this case it is not necessary to specify lag
fit <- employed %>% model(SNaive = SNAIVE(Employed ~ lag("year")))
fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID       SNaive
  <chr>          <model>
1 CEU0500000001 <SNAIVE>

Fitted values

Because the method generates forecasts by using the corresponding observation from the previous season, the model does not generate fitted values for the first season (there is no previous season from which to take them):

# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals) 
# A tsibble: 6 x 7 [1M]
# Key:       Series_ID, .model [1]
  Series_ID     .model    Month Employed .fitted .resid .innov
  <chr>         <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
1 CEU0500000001 SNaive 2010 Jan   105444      NA     NA     NA
2 CEU0500000001 SNaive 2010 Feb   105490      NA     NA     NA
3 CEU0500000001 SNaive 2010 Mar   106176      NA     NA     NA
4 CEU0500000001 SNaive 2010 Apr   107220      NA     NA     NA
5 CEU0500000001 SNaive 2010 May   107931      NA     NA     NA
6 CEU0500000001 SNaive 2010 Jun   108710      NA     NA     NA
# Print fitted values along with the original series
fitted_vals %>% 
  filter(.model == "SNaive") %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 12 rows containing missing values (`geom_line()`).

Forecasts

# produce forecasts for 36 timesteps
forecasts <- fit %>% forecast(h = 36)
forecasts
# A fable: 36 x 5 [1M]
# Key:     Series_ID, .model [1]
   Series_ID     .model    Month           Employed  .mean
   <chr>         <chr>     <mth>             <dist>  <dbl>
 1 CEU0500000001 SNaive 2019 Oct N(128001, 5538565) 128001
 2 CEU0500000001 SNaive 2019 Nov N(128415, 5538565) 128415
 3 CEU0500000001 SNaive 2019 Dec N(128363, 5538565) 128363
 4 CEU0500000001 SNaive 2020 Jan N(125932, 5538565) 125932
 5 CEU0500000001 SNaive 2020 Feb N(126370, 5538565) 126370
 6 CEU0500000001 SNaive 2020 Mar N(126994, 5538565) 126994
 7 CEU0500000001 SNaive 2020 Apr N(128007, 5538565) 128007
 8 CEU0500000001 SNaive 2020 May N(128771, 5538565) 128771
 9 CEU0500000001 SNaive 2020 Jun N(129800, 5538565) 129800
10 CEU0500000001 SNaive 2020 Jul N(129883, 5538565) 129883
# ℹ 26 more rows
# Depict the forecasts
forecasts %>%
  autoplot(employed, level = NULL) +
  autolayer(fitted_vals, .fitted, colour = "blue", linetype = "dashed")
Warning: Removed 12 rows containing missing values (`geom_line()`).

Estimates of the model parameters

tidy(fit)
# A tibble: 0 × 7
# ℹ 7 variables: Series_ID <chr>, .model <chr>, term <chr>, estimate <dbl>,
#   std.error <dbl>, statistic <dbl>, p.value <dbl>

As we can see, the seasonal naïve model, just like the naïve model, does not have any parameters.

Drift method

The drift method allows forecasts to increase or decrease over time, with the forecasts changing by a specific constant, called the drift, for every unit of time.

  • The drift is the amount of change per unit time, set to be the average change in the historical data.

That average change in the historical data will in the end be the slope of the line joining the first and last point in the training data. The figures below explain this in more detail:

The drift method extrapolates this average change, estimated as a slope, into the future: to compute \(\hat{y}_{T+h|T}\) it simply adds \(h\) times this average change to the last observation \(y_T\). In other words, it creates a straight line covering the entire forecast horizon with a slope equal to the average change. Mathematically, it works as follows:

If \(\Delta y_t = y_t - y_{t-1}\) then the equation for the drift method may be written as:

\[ \begin{equation} \hat{y}_{T+h|T} = y_{T} + h \underbrace{\frac{\sum_{t=2}^{t=T}\Delta y_t}{T-1}}_{\substack{\text{average change of } y \\ \text{in the historical data}}} = y_{T} + h \ \frac{\sum_{t=2}^{t=T} (y_{t}-y_{t-1})}{{T-1}} = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right) \end{equation} \]
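A quick self-contained check of the last equality (the annual Australian exports series from global_economy is used for illustration):

```r
library(fpp3)

aus_exports <- global_economy %>% filter(Country == "Australia")
fc <- aus_exports %>% model(Drift = RW(Exports ~ drift())) %>% forecast(h = 5)

# Point forecasts lie on the line through the first and last observations
y  <- aus_exports$Exports
TT <- length(y)
isTRUE(all.equal(fc$.mean, y[TT] + (1:5) * (y[TT] - y[1]) / (TT - 1)))
```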

Defining and training the model

  • NOTE: RW stands for Random Walk. More about this later in the subject, in the context of ARIMA models.
fit <- employed %>% model(Drift = RW(Employed ~ drift()))
fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID             Drift
  <chr>               <model>
1 CEU0500000001 <RW w/ drift>

This is equivalent to drawing a line between the first and last observations and extrapolating it into the future (the final graph of the example clarifies this further).

Fitted values

# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals) 
# A tsibble: 6 x 7 [1M]
# Key:       Series_ID, .model [1]
  Series_ID     .model    Month Employed .fitted .resid .innov
  <chr>         <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
1 CEU0500000001 Drift  2010 Jan   105444     NA     NA     NA 
2 CEU0500000001 Drift  2010 Feb   105490 105650.  -160.  -160.
3 CEU0500000001 Drift  2010 Mar   106176 105696.   480.   480.
4 CEU0500000001 Drift  2010 Apr   107220 106382.   838.   838.
5 CEU0500000001 Drift  2010 May   107931 107426.   505.   505.
6 CEU0500000001 Drift  2010 Jun   108710 108137.   573.   573.
# Print fitted values along with the original series
fitted_vals %>% 
  filter(.model == "Drift") %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).

The fitted values look very similar to the SNAIVE fitted values, but that is only because both closely track the series one step ahead. Note that here fitted values are produced from the second observation onwards (unlike in the seasonal naïve case, where the whole first season is missing).

Forecasts

forecasts <- fit %>% forecast(h=8)
forecasts
# A fable: 8 x 5 [1M]
# Key:     Series_ID, .model [1]
  Series_ID     .model    Month           Employed   .mean
  <chr>         <chr>     <mth>             <dist>   <dbl>
1 CEU0500000001 Drift  2019 Oct  N(129518, 764786) 129518.
2 CEU0500000001 Drift  2019 Nov N(129724, 1542645) 129724.
3 CEU0500000001 Drift  2019 Dec N(129929, 2333578) 129929.
4 CEU0500000001 Drift  2020 Jan N(130135, 3137584) 130135.
5 CEU0500000001 Drift  2020 Feb   N(130341, 4e+06) 130341.
6 CEU0500000001 Drift  2020 Mar N(130547, 4784816) 130547.
7 CEU0500000001 Drift  2020 Apr N(130752, 5628041) 130752.
8 CEU0500000001 Drift  2020 May N(130958, 6484340) 130958.
# Extract the initial and final points of the series
# to depict the average change slope.
n_rows = nrow(employed)
drift_points <- tibble(Month = c(employed$Month[1], employed$Month[n_rows]), 
                       Employed = c(employed$Employed[1], employed$Employed[n_rows])) %>%
                as_tsibble()
Using `Month` as index variable.
# Depict the forecasts
forecasts %>%
  autoplot(employed, level = NULL) +
  autolayer(fitted_vals, .fitted, colour = "blue", linetype = "dashed") + 
  geom_line(drift_points, mapping = aes(x = Month, y = Employed), color = "red", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).

In the above graph we can see that:

  1. The forecasts are generated following the slope of the line that connects the initial and final points of the series (red dashed line). A further example is depicted in the image below:

Further example of the drift method. From [1]

  2. The fitted values are very similar to those of the Naïve method. Nonetheless, they are not the same: in this case each fitted value is generated by a one-step forecast using the drift method, that is, the previous observation plus the estimated average change (the drift).

Estimates of the model parameters

We can look at the number of parameters in a model and their values using the function tidy(). If we apply it to the drift model we get:

tidy(fit)
# A tibble: 1 × 7
  Series_ID     .model term  estimate std.error statistic p.value
  <chr>         <chr>  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 CEU0500000001 Drift  b         206.      80.8      2.54  0.0123

The drift model has only one parameter, the slope of the line joining the first and the last observation.
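
As a quick sanity check, this slope can be reproduced by hand from the first and last observations (this sketch reuses the `employed` tsibble and `n_rows` defined earlier in this notebook):

```r
# Average change per period: (last observation - first observation) / (number of steps).
# Up to rounding, this matches the estimate b reported by tidy(fit).
(employed$Employed[n_rows] - employed$Employed[1]) / (n_rows - 1)
```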

Forecasting with decomposition

If an appropriate decomposition algorithm has been found for the time series, it is often of interest to build two separate models:

  1. A model for the seasonally adjusted component
  2. A model for the seasonal component.

Afterwards, the forecasts of the two models are combined to yield the final forecast.

Recall that, if the seasonal component is \(\hat{S}_t\), the seasonally adjusted component \(\hat{A}_t\) is what results from removing the effect of seasonality from the time series.

The seasonally adjusted component is computed differently depending on whether the scheme is additive or multiplicative:

\[ \begin{align*} \text{For additive schemes} && \hat{A}_t = y_t - \hat{S}_t \\ \text{For multiplicative schemes} && \hat{A}_t = \frac{y_t}{\hat{S}_t} \\ \end{align*} \]

The decomposed series can be written as:

  • \(y_t = \hat{S}_t + \hat{A}_t\) for additive schemes
    • where \(\hat{A}_t = \hat{T}_t+\hat{R}_{t}\) is the seasonally adjusted component.
  • \(y_t = \hat{S}_t\hat{A}_t\) for multiplicative schemes
    • where \(\hat{A}_t = \hat{T}_t\hat{R}_{t}\) is the seasonally adjusted component.

To forecast a decomposed time series, \(\hat{S}_t\) and \(\hat{A}_t\) are forecast separately:

  • \(\hat{A}_t\): any non-seasonal forecasting method may be used (e.g. the drift method, Holt’s method, ARIMA…).
  • \(\hat{S}_t\): usually assumed to remain unchanged, or to change very slowly. It is therefore often forecast with a seasonal naïve model, i.e. by repeating the observations of the last season.
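
The forecasts are then recombined according to the decomposition scheme:

\[ \begin{align*} \text{For additive schemes} && \hat{y}_{T+h|T} = \hat{S}_{T+h|T} + \hat{A}_{T+h|T} \\ \text{For multiplicative schemes} && \hat{y}_{T+h|T} = \hat{S}_{T+h|T}\hat{A}_{T+h|T} \\ \end{align*} \]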

Fitting the model

Let us again resort to the employed time series:

employed <- filter(us_employment, Title == "Total Private", Month >= yearmonth("01-01-2010"))
autoplot(employed)
Plot variable not specified, automatically selected `.vars = Employed`

To fully understand the model specification, let us first look at the outcome of an STL decomposition applied to this series

employed %>% 
  model(
    stl = STL(Employed)
  ) %>% 
  components()
# A dable: 117 x 8 [1M]
# Key:     Series_ID, .model [1]
# :        Employed = trend + season_year + remainder
   Series_ID .model    Month Employed  trend season_year remainder season_adjust
   <chr>     <chr>     <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
 1 CEU05000… stl    2010 Jan   105444 1.07e5      -2016.    272.         107460.
 2 CEU05000… stl    2010 Feb   105490 1.07e5      -1816.     -8.49       107306.
 3 CEU05000… stl    2010 Mar   106176 1.07e5      -1231.    -33.0        107407.
 4 CEU05000… stl    2010 Apr   107220 1.08e5       -361.     14.7        107581.
 5 CEU05000… stl    2010 May   107931 1.08e5        305.    -69.4        107626.
 6 CEU05000… stl    2010 Jun   108710 1.08e5       1006.   -121.         107704.
 7 CEU05000… stl    2010 Jul   108784 1.08e5        916.    -85.9        107868.
 8 CEU05000… stl    2010 Aug   108936 1.08e5        886.    -36.8        108050.
 9 CEU05000… stl    2010 Sep   108568 1.08e5        357.     -9.10       108211.
10 CEU05000… stl    2010 Oct   108984 1.08e5        648.    -17.2        108336.
# ℹ 107 more rows

The seasonal component is stored in a column named season_year and the seasonally adjusted component is stored in a column named season_adjust. To give a full specification to decomposition_model() we need to:

  • Specify a decomposition scheme
  • Provide a model for the seasonally adjusted component.
  • Provide a model for the seasonal component (by default a SNAIVE model is used, so this is not strictly necessary).
fit <- employed %>% 
  model(
    decomp = decomposition_model(
                # Specify the decomposition scheme to be used.
                STL(Employed),
                # Specify a model for the seasonally adjusted component (in this case, a drift).
                RW(season_adjust ~ drift()),
                # Specify a model for the seasonal component (unnecessary, since SNAIVE is the default).
                SNAIVE(season_year)
            )
  )

fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID                        decomp
  <chr>                           <model>
1 CEU0500000001 <STL decomposition model>

Fitted values

As usual, you can extract the fitted values using augment():

fit %>% augment()
# A tsibble: 117 x 7 [1M]
# Key:       Series_ID, .model [1]
   Series_ID     .model    Month Employed .fitted .resid .innov
   <chr>         <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
 1 CEU0500000001 decomp 2010 Jan   105444      NA     NA     NA
 2 CEU0500000001 decomp 2010 Feb   105490      NA     NA     NA
 3 CEU0500000001 decomp 2010 Mar   106176      NA     NA     NA
 4 CEU0500000001 decomp 2010 Apr   107220      NA     NA     NA
 5 CEU0500000001 decomp 2010 May   107931      NA     NA     NA
 6 CEU0500000001 decomp 2010 Jun   108710      NA     NA     NA
 7 CEU0500000001 decomp 2010 Jul   108784      NA     NA     NA
 8 CEU0500000001 decomp 2010 Aug   108936      NA     NA     NA
 9 CEU0500000001 decomp 2010 Sep   108568      NA     NA     NA
10 CEU0500000001 decomp 2010 Oct   108984      NA     NA     NA
# ℹ 107 more rows

QUESTION: Why are the first rows of the fitted values NAs?

Forecasts

fit %>%
  forecast() %>%
  autoplot(employed, level = NULL) +
  labs(y = "Number of people",
       title = "US retail employment")

The graph above shows that the model combines the SNAIVE forecasts of the seasonal component with the drift forecasts of the seasonally adjusted component (this is equivalent to adding the slope of the drift model to the seasonal naïve forecasts).

Estimates of the model parameters

When using the function tidy() on the decomposition model, we get an error:

tidy(fit)
Error in `mutate()`:
ℹ In argument: `dplyr::across(all_of(mbl_vars), function(x) lapply(x,
  tidy, ...))`.
Caused by error in `across()`:
! Can't compute column `decomp`.
Caused by error in `UseMethod()`:
! no applicable method for 'tidy' applied to an object of class "c('decomposition_model', 'model_combination')"

What this means is that decomposition_model() is actually a combination of two models, and tidy() is not programmed to deal with this. To get estimates of the model parameters, we can compute the decomposition ourselves and then fit the two models separately: one to the seasonally adjusted component and one to the seasonal component. Inspecting each fitted object then yields the estimates.

# Compute the STL decomposition
dcmp <- 
  employed %>% 
  model(
    stl = STL(Employed)
  ) %>% 
  components()

# Model for the seasonally adjusted data.
# Note that season_adjust is a column within dcmp
fit_seasonadj <- 
  dcmp %>% 
  select(season_adjust) %>% 
  model(
    drift = RW(season_adjust ~ drift())
  )

# Model for the seasonal data.
# Note that season_year is a column within dcmp
fit_seasonal <- 
  dcmp %>%
  select(season_year) %>% 
  model(
    snaive = SNAIVE(season_year)
  )

# Parameters of drift model
tidy(fit_seasonadj)
# A tibble: 1 × 6
  .model term  estimate std.error statistic  p.value
  <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 drift  b         186.      8.66      21.5 4.48e-42
# Estimates of parameters for snaive model
tidy(fit_seasonal)
# A tibble: 0 × 6
# ℹ 6 variables: .model <chr>, term <chr>, estimate <dbl>, std.error <dbl>,
#   statistic <dbl>, p.value <dbl>

The table is empty because the seasonal naïve model has no parameters to estimate: it simply repeats the observations of the previous season.

Exercise 1

For this exercise we are going to use the aus_retail dataset. Remember you can always run ?aus_retail in the console to read the details about the dataset.

The dataset contains a variety of series that can be filtered using their specific series ID. We will be working with the following Series ID: A3349767W. The following code filters the series for you.

retail_series <- aus_retail %>%
  filter(`Series ID` == "A3349767W")

head(retail_series)
# A tsibble: 6 x 5 [1M]
# Key:       State, Industry [1]
  State              Industry                      `Series ID`    Month Turnover
  <chr>              <chr>                         <chr>          <mth>    <dbl>
1 Northern Territory Clothing, footwear and perso… A3349767W   1988 Apr      2.3
2 Northern Territory Clothing, footwear and perso… A3349767W   1988 May      2.9
3 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jun      2.6
4 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jul      2.8
5 Northern Territory Clothing, footwear and perso… A3349767W   1988 Aug      2.9
6 Northern Territory Clothing, footwear and perso… A3349767W   1988 Sep      3  

1. Use an STL decomposition to calculate the trend-cycle and seasonal indices. If you deem it necessary, adjust the trend and season windows (solved)

NOTE: remember that the STL decomposition is only applicable to additive schemes, which means you may need to transform the data first to obtain an additive scheme. To pick the best transformation, I am going to use the guerrero feature, something we will see in the following session. For now, take this analysis as given (I leave it here so that you can understand it later):

retail_series %>% autoplot()
Plot variable not specified, automatically selected `.vars = Turnover`

# let us examine the guerrero feature to see which box-cox transformation would
# be more appropriate.
lambda <- retail_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda
[1] 0.08303631

This means that a log transformation should be applied (the estimated \(\lambda \approx 0.08\) is close to 0, which corresponds to a log transformation). In this particular case we are not going to adjust the STL windows. We are going to specify the log transformation within the model formula; we will see later why this is important.

dcmp <- retail_series %>%
        model(stl = STL(log(Turnover))) %>%
        components()

dcmp %>% autoplot()

Now, having this decomposition, proceed to carry out the rest of the exercise.

2. Extract the seasonally adjusted data from dcmp using select and store it in a separate tsibble called season_adjust. Plot the seasonally adjusted data

3. Use the drift method to produce forecasts of the seasonally adjusted data. Generate a graph with the time series, the fitted values and the forecasts

4. Extract the seasonal component from dcmp using select and store it in a separate tsibble called seasonal. Use the SNAIVE method to produce forecasts of the seasonal component. Generate a graph with the time series, the fitted values and the forecasts

5. Now use decomposition_model() on the original time series. Use the drift method for the seasonally adjusted data and the seasonal naïve method for the seasonal component.